Entry Name:  Purdue-Hatton-MC2

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

William Hatton, United States Air Force Academy, C16william.hatton@usafa.edu             PRIMARY

Jieqiong Zhao, Purdue University, zhao413@purdue.edu

Mahesh Babu Gorantla, Purdue University, mgorantl@purdue.edu

Junghoon Chae, Purdue University, jchae@purdue.edu

Benjamin Ahlbrand, Purdue University, bahlbran@purdue.edu

Hanye Xu, Purdue University, xu193@purdue.edu

Siqiao Chen, Purdue University, chen1722@purdue.edu

Guizhen Wang, Purdue University, wang1908@purdue.edu

Jiawei Zhang, Purdue University, zhan1486@purdue.edu

Abish Malik, Purdue University, amalik@purdue.edu

Sungahn Ko, Purdue University, ko@purdue.edu

David Ebert, Purdue University, ebertd@purdue.edu

 

Student Team:  No.

Did you use data from both mini-challenges?  Yes.

 

Analytic Tools Used:

Tableau, R, MS Excel, our custom designed system

 

Approximately how many hours were spent working on this submission in total?

100 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes

 

Video Download

Video:

http://pixel.ecn.purdue.edu:8080/~zhan1486/VASTCHALLENGE15/MC2.wmv

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Identify those IDs that stand out for their large volumes of communication.  For each of these IDs:

 

      a.      Characterize the communication patterns you see.

      b.      Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

The data show two IDs with an abnormally large volume of communication. Using our system, we sort the IDs in descending order by total number of messages. This process is shown below:

Figure 1: Operation flow of analysis: (1): choose specific time range; (2): view the user list ordered by communication volume; (3): select a user and examine his/her detailed time-series pattern.
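
A minimal Python sketch of the ranking step (2) in Figure 1 follows. It is illustrative only and is not the code of the custom system: the record layout (timestamp, sender, receiver, location), the sample rows, and the name rank_by_volume are all assumptions, as is the choice to count a message once for its sender and once for its receiver.

```python
from collections import Counter

# Hypothetical sample rows: (timestamp, sender_id, receiver_id, location).
# The layout and the values are assumptions made for illustration.
records = [
    ("2014-06-06 12:00:05", "1278894", "100", "Entry Corridor"),
    ("2014-06-06 12:00:05", "1278894", "101", "Entry Corridor"),
    ("2014-06-06 12:00:05", "1278894", "102", "Entry Corridor"),
    ("2014-06-06 12:01:10", "839736", "100", "Entry Corridor"),
    ("2014-06-06 12:01:40", "101", "839736", "Wet Land"),
]


def rank_by_volume(records):
    """Return (id, message_count) pairs sorted by descending volume.

    Assumption: a message counts once for its sender and once for
    its receiver, so the total reflects both directions.
    """
    counts = Counter()
    for _, sender, receiver, _ in records:
        counts[sender] += 1
        counts[receiver] += 1
    return counts.most_common()


ranking = rank_by_volume(records)
for user_id, total in ranking[:3]:
    print(user_id, total)
```

On the real data, the IDs at the top of such a list are the candidates to inspect in the time-series view.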

The two IDs that stand out for their large volumes of communication, and their patterns, are as follows:

1.      ID 1278894

a.       At 12:00 PM each day, this person sends out a large group message. The person then sends another message to about the same number of people every 5 minutes until the next hour (1:00 PM), and then waits an hour to begin again. The person continues this cycle of an hour of 5-minute-interval messages followed by an hour-long break through 9:00 PM (the last message is sent at 8:55 PM). On Friday and Saturday, the number of messages that this person sends at 2:55 PM and 4:00 PM dips, likely due to the second performance of the day by Scott Jones.

b.      We hypothesize that this person is a park employee sending out information to all park visitors. Since the number of messages the person sends seems to fluctuate with the number of people in the park and with park events (Scott’s second performance in particular), we theorize that this person sends messages to park visitors who are not currently checked in, with information about which attractions are open or have short lines and/or other general park information.

2.      ID 839736

a.       This person's messages show little pattern, but he/she sends between 5 and 20 messages every minute of each day. The only exception occurs at 12:00 on Sunday, when there is a huge spike of up to 1400 messages that slowly declines back to his/her normal rate over the next 45 minutes.

b.      We hypothesize that this person is also a park employee, one who deals with safety and security issues, given the constant activity throughout the weekend. The spike in communication on Sunday would likely stem from the crime involving Scott Jones, which this person would be responsible for handling.

 

Figure 2: Communication patterns of top two users who had high message volume (user 1278894 and user 839736).
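
The per-ID time-series view can be approximated by counting one sender's messages in 5-minute slots. The Python sketch below is an illustration under assumptions: the record layout, the timestamp format, the sample rows, and the name five_minute_profile are hypothetical and do not come from the custom system.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (timestamp, sender_id, receiver_id, location).
records = [
    ("2014-06-06 12:00:02", "1278894", "100", "Entry Corridor"),
    ("2014-06-06 12:00:03", "1278894", "101", "Entry Corridor"),
    ("2014-06-06 12:05:01", "1278894", "100", "Entry Corridor"),
    ("2014-06-06 13:10:00", "839736", "100", "Entry Corridor"),
    ("2014-06-06 14:00:00", "1278894", "101", "Entry Corridor"),
]


def five_minute_profile(records, sender_id):
    """Count messages sent by sender_id in each 5-minute slot."""
    bins = Counter()
    for ts, sender, _, _ in records:
        if sender != sender_id:
            continue
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Round the timestamp down to the start of its 5-minute slot.
        slot = t.replace(minute=t.minute - t.minute % 5, second=0)
        bins[slot] += 1
    return dict(sorted(bins.items()))


profile = five_minute_profile(records, "1278894")
# Hours in which the sender was active; an on/off hourly cycle
# would show up here as alternating hours.
active_hours = sorted({slot.hour for slot in profile})
print(profile)
print(active_hours)
```

For a sender with an hourly on/off cycle like the one described for ID 1278894, the list of active hours would skip every other hour.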

MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

Limit your response to no more than 10 images and 1000 words.

 

The patterns we were able to find in the communications data are described below:

1.       There is a spike in communications from Coaster Alley at 11:00 AM and 4:00 PM on Friday and Saturday. Sunday only shows a spike at 11:00 AM. Scott’s shows begin at 10:00 AM and 3:00 PM, so the flurry of activity from Coaster Alley, where the performance stage is located, likely marks the end of each show. The spike in messages sent from Coaster Alley is accompanied by a large increase in messages sent to Coaster Alley as well. This could be the result of people responding quickly to the messages sent from the location, but it is more likely the result of people communicating with friends who just saw the show, or trying to find friends who were separated from them in the stage area or who did not attend the show.

 

Figure 3: Communication data over time for Coaster Alley for Friday (Left), Saturday (Middle), and Sunday (Right).

 

2.       Small groups generally communicate with other small groups. Our tool groups people who communicate frequently into clusters. By looking at the IDs in a cluster, we find that many of them also travel together. In fact, there are several clusters of 7 or 8 people that contain smaller groups of 2-3 people who travel together but communicate with everyone in the cluster. This could result from small groups breaking off from a bigger group while in the park, or from people who meet new friends and talk to them throughout the day.

 

 

Figure 4: A cluster of people that travel separately, but still communicate amongst themselves. (Left) Clustering individuals based on communication data on Friday. (Middle) Clustering these individuals based on their travel sequence patterns. (Right) List of IDs in the selected cluster.
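
The clustering method used by the tool is not described here. As a stand-in only, the Python sketch below groups IDs by linking any pair that exchanged at least min_messages messages and then taking connected components. The technique, the threshold, the record layout, the sample rows, and the name communication_groups are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (timestamp, sender_id, receiver_id, location).
records = [
    ("2014-06-06 10:00:00", "A", "B", "Wet Land"),
    ("2014-06-06 10:01:00", "B", "A", "Wet Land"),
    ("2014-06-06 10:02:00", "B", "C", "Tundra Land"),
    ("2014-06-06 10:03:00", "C", "B", "Tundra Land"),
    ("2014-06-06 10:04:00", "D", "E", "Kiddie Land"),
    ("2014-06-06 10:05:00", "E", "D", "Kiddie Land"),
    ("2014-06-06 10:06:00", "A", "D", "Wet Land"),
]


def communication_groups(records, min_messages=2):
    """Group IDs linked by pairs that exchanged >= min_messages messages."""
    pair_counts = Counter()
    for _, sender, receiver, _ in records:
        pair_counts[frozenset((sender, receiver))] += 1

    # Build an undirected graph of sufficiently active pairs.
    graph = defaultdict(set)
    for pair, n in pair_counts.items():
        if n >= min_messages and len(pair) == 2:
            a, b = pair
            graph[a].add(b)
            graph[b].add(a)

    # Collect connected components with a depth-first search.
    groups, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        groups.append(sorted(members))
    return groups


groups = communication_groups(records)
print(groups)
```

In this toy input, the single message between A and D falls below the threshold, so the two groups stay separate.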

3.       People who travel together (based on our attraction attendance clustering method) almost always communicate by messaging. Although it seems unusual that people standing right next to each other would message one another, this is likely due to group members communicating when they get separated or when they visit different attractions during the day. By studying these clusters, we find that many people who travel together are also grouped together by their frequent communication. Also, the time series graphs of the total messages sent to and from the different locations look very similar. These results confirm that people are sending messages to others travelling with them and located in the same region of the park.

4.       On each of the three days, more messages to external recipients are sent from Wet Land than from any other area. This makes the most sense for Sunday, since people are likely to message acquaintances outside the park about the crime as they observe the happenings around the crime scene. People are also likely to contact outsiders about the police investigation going on that day. The popularity of external messages on Friday and Saturday may come from people sending picture messages and/or descriptions of Scott’s memorabilia to people they know who are not at the park.

 

Figure 5: Communication traffic by region to external people on the three days of the event.

 

5.       Wet Land is also the most popular area for people in the park to send and receive messages. After removing the heavy afternoon flux of messages from and to ID 1278894 at the Entry Corridor, Wet Land leads in messages sent throughout the day. This is evident in the image below, where this single ID is filtered out of the total-message time graphs. Although there are several rides in Wet Land, its heavy role in communications is likely the result of the Beer Garden located there and, of course, the pavilion. Most people visit the pavilion at some point in the day and are likely to communicate about the memorabilia they are observing with the other members of their travelling group and their messaging group (communication cluster).

 

Figure 6: Communication traffic over time on Sunday (Top), Saturday (Middle), and Friday (Bottom). We find that the Wet Land is the most popular area for people to communicate.
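
The filtering step behind Figure 6 can be sketched in Python as follows. This is an illustration, not the system's code: the record layout, the sample rows, and the name messages_by_location are assumptions, and a message is dropped here if the excluded ID is either its sender or its receiver.

```python
from collections import Counter

# Hypothetical sample rows: (timestamp, sender_id, receiver_id, location).
records = [
    ("2014-06-08 12:00:00", "1278894", "100", "Entry Corridor"),
    ("2014-06-08 12:00:00", "1278894", "101", "Entry Corridor"),
    ("2014-06-08 12:00:00", "1278894", "102", "Entry Corridor"),
    ("2014-06-08 12:01:00", "100", "101", "Wet Land"),
    ("2014-06-08 12:02:00", "101", "100", "Wet Land"),
    ("2014-06-08 12:03:00", "102", "100", "Coaster Alley"),
]


def messages_by_location(records, exclude_ids=frozenset()):
    """Count messages per sending location, skipping excluded IDs."""
    counts = Counter()
    for _, sender, receiver, location in records:
        if sender in exclude_ids or receiver in exclude_ids:
            continue
        counts[location] += 1
    return counts


unfiltered = messages_by_location(records)
filtered = messages_by_location(records, exclude_ids={"1278894"})
print(unfiltered.most_common(1))
print(filtered.most_common(1))
```

With the broadcast ID included, the Entry Corridor dominates the toy counts; with it excluded, Wet Land leads, which mirrors the comparison made in pattern 5.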

 

6.       Although there is a spike in Coaster Alley communications at the end of Scott’s shows (Pattern 1), there is a lull in communication in all areas of the park during the shows. The lull is less evident for 10:00-11:00 because fewer people are in the park then, but the time series graph of sent messages for 3:00-4:00 clearly shows communications dropping in every region each day (except Sunday afternoon). The drop likely occurs because a large number of visitors move to the performance stage to watch Scott’s show and, once the show starts, stop messaging.

 

Figure 7: The number of (distinct) people who send/receive messages over time on the three days.

7.       From the above time series graph, we can infer that park operations reach full swing at around 11:00 AM, which is also the time at which the athlete gives his first speech of the day. Analyzing the peaks, we found that at 12:00 PM every day, ID 1278894, whom we assume to be a park official, sends messages to almost every person in the park every five minutes for an hour and then waits an hour before beginning again. This pattern continues until 9:00 PM every day.

8.       Visitors apparently respond to the messages sent by the park employees (IDs 1278894 and 839736) and enter into conversations with these administrators. The increase in messages sent from the Entry Corridor in the afternoon by ID 1278894 corresponds to an increase in messages to the Entry Corridor during each hour the employee sends out messages. Also, by looking at individual ID communication data, as shown below, we see people send messages back and forth with ID 839736. The spike of messages from the Entry Corridor at 12:00 on Sunday from ID 839736 is also paralleled by a spike of messages from other areas in response. These park employees are clearly interacting with the visitors rather than simply distributing general information.

 

Figure 8: Number of messages sent from the entry corridor location (Left) and received from the different locations (Right) for the three days.

MC2.3 From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

 

Limit your response to no more than 3 images and 300 words. 

 

The total number of messages sent in the park shows how actively people are communicating at a given time. To study this across the three days and look for unusual patterns possibly related to the crime, we used time series graphs of the total messages from park visitors on each day of the weekend. These graphs show similar patterns on each day, except between 11:30 AM and 12:15 PM on Sunday, when there is a spike in communications in several areas of the park. The spike first occurs in Wet Land, which reaches an unusual number of messages for that area.

 

Figure 9: Communication patterns on Sunday for Wet Land.
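
We identified the Sunday spike visually from the time series graphs. A rough automated analogue, sketched below in Python, flags time slots in which one day's count exceeds a multiple of the average count for the same slot on baseline days. The 15-minute slots, the factor of 2, the function name flag_spikes, and all of the counts are invented for illustration and are not values from the data.

```python
def flag_spikes(baseline_days, target_day, factor=2.0):
    """Return slots where target_day exceeds factor x the baseline mean.

    baseline_days: list of dicts mapping "HH:MM" slot labels to counts.
    target_day: dict with the same kind of mapping.
    """
    flagged = []
    for slot, count in sorted(target_day.items()):
        baseline = [day.get(slot, 0) for day in baseline_days]
        mean = sum(baseline) / len(baseline)
        # Use a floor of 1 so empty baseline slots do not flag trivially.
        if count > factor * max(mean, 1):
            flagged.append(slot)
    return flagged


# Made-up message counts per 15-minute slot for one park area.
friday = {"11:15": 40, "11:30": 42, "11:45": 45, "12:00": 41}
saturday = {"11:15": 42, "11:30": 44, "11:45": 47, "12:00": 43}
sunday = {"11:15": 43, "11:30": 120, "11:45": 150, "12:00": 110}

spikes = flag_spikes([friday, saturday], sunday)
print(spikes)
```

The first flagged slot gives a candidate start time for the anomaly, which can then be checked against the graphs.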

On the map, we see that the pavilion entrance is located in Wet Land. Since the crime occurred at the pavilion, we can conclude that the crime was discovered at around 11:30 AM and that the flurry of communications from that region over the next half hour was the result of the discovery.

 

Also, using the heat map feature in our tool, the movement data show no check-ins to the pavilion after 12:00 PM. At 12:00 PM, we see a large spike in communications from the Entry Corridor. Since IDs 839736 and 1278894 (the only known park employees sending messages) send all their messages from the Entry Corridor, we infer that they likely sent messages notifying visitors that the pavilion was closed at that time. Thus, since the pavilion was closed for the police to conduct their investigation (as stated in the news article), we further confirm that the crime occurred before 12:00 PM and was discovered at 11:30 AM, the beginning of the communication spike.

 


 

Figure 10: No check-ins are observed in the pavilion after 12:00pm on Sunday.